ABSTRACT

A brief survey of several graphical multivariate techniques is given.
Andrews' method is exploited as a graphical tool for the examination of
changes over time in the parameters of a time series model. An example is
given to illustrate the method.

SIGNIFICANCE AND EXPLANATION


A brief survey of several graphical multivariate techniques is given.

One of these, due to Andrews, is described in more detail. In his method, Andrews
represents each multidimensional point by a Fourier function. The clustering
of plots of these functions is equivalent to the clustering of the
multidimensional points. Andrews' method is exploited as a graphical tool for the
examination of changes over time in the parameters of a time series model. An
example consisting of temperature data is given to illustrate the method.


The  responsibility  for  the  wording  and  views  expressed  in  this  descriptive 
summary  lies  with  MRC,  and  not  with  the  author  of  this  report. 


ANDREWS'  PLOTS  AND  THEIR  APPLICATIONS 


Agnes  M.  Herzberg* 


1.  Introduction.

If multivariate data are m-dimensional, then each set of m measurements
can be represented as an m-dimensional point. For m = 1, 2, these points may
be plotted and clusters easily determined by inspection. For m > 2, this is
more difficult. Several authors have developed graphical techniques to plot
high-dimensional data in two dimensions so that the data can be clustered
visually; see, for example, Andrews (1972), Chernoff (1973), Kleiner
and Hartigan (1981) and Anderson (1928, 1936). More mathematical techniques
have been given by Beale (1969) and Banfield and Bassil (1977).

2.  Several  graphical  methods. 

Kleiner and Hartigan (1981) introduced what they termed trees and castles.
First, a hierarchical clustering algorithm is applied to the m variables
over all the points; see, for example, Gnanadesikan (1977). From this the
structure of the tree or castle is determined. All points are
represented by a similar structure, i.e. the thickness, position and angle of
the branches in the case of trees are the same, but the length of the
branches is determined by the sizes of the respective variables for the
individual points. Trees and castles judged similar by visual inspection
are then clustered together.

Chernoff (1973) represents each variable as a feature of a face, for
example length of mouth, shape of mouth, size of eyes, etc. The resulting
clustering of this representation is very subjective because different people
focus on different features of faces.

Anderson  (1928,  1936)  was  very  concerned  with  the  sepal  length  and  width 
and  petal  length  and  width  of  irises.  He  developed  pictorial  methods  which  he 
called  ideographs  for  representing  and  comparing  these  four-dimensional 
data.  An  ideograph  looks  like  an  upside-down  U  with  some  width.  In  the 
case  of  the  iris  measurements,  the  inside  and  outside  height  and  width 
measurements  of  the  ideograph  are  proportional  to  the  sepal  length  and  width 
and  the  petal  length  and  width,  respectively.  Similar  ideographs  can  easily 
be  clustered  by  visual  inspection. 

There  are  many  other  graphical  representations  for  multivariate  data,  but 
these  will  not  be  discussed. 

3.  Andrews'  plots. 

Andrews  (1972)  proposed  the  following  simple  and  useful  method  of 
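plotting high-dimensional data in two dimensions. If the data are
m-dimensional, each point $x' = (x_1, \ldots, x_m)$, where $x_i$ (i = 1, ..., m) are the
measured variables, is represented by the function

$$ f_x(t) = x_1 2^{-1/2} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \cdots \qquad (1) $$

plotted over the range $-\pi < t < \pi$. The functions given by (1) have several
properties. If $x_i = (x_{1i}, \ldots, x_{mi})'$ (i = 1, ..., n) are n points in
m-dimensional space, then, as shown by Andrews (1972), the function of the mean
point $\bar{x}$ is the pointwise mean of the functions,

$$ f_{\bar{x}}(t) = \frac{1}{n} \sum_{i=1}^{n} f_{x_i}(t), $$

and the distance between two functions is proportional to the Euclidean
distance between the corresponding points,

$$ \int_{-\pi}^{\pi} [f_{x_i}(t) - f_{x_j}(t)]^2 \, dt = \pi \sum_{k=1}^{m} (x_{ki} - x_{kj})^2 . $$

If the components of x are uncorrelated with common variance $\sigma^2$, then

$$ \mathrm{var}[f_x(t)] = \sigma^2 \left( \tfrac{1}{2} + \sin^2 t + \cos^2 t + \sin^2 2t + \cos^2 2t + \cdots \right), $$

which is the constant $m\sigma^2/2$ when m is odd. Finally, for a fixed value
$t = t_0$, $f_x(t_0)$ is proportional to the length of the projection of x on the
vector $(2^{-1/2}, \sin t_0, \cos t_0, \sin 2t_0, \cos 2t_0, \ldots)$.

Thus Andrews' plots will preserve means, distances and variances and will also
give one-dimensional projections. When (1) is plotted for each data point
$x_i$, the clustering of the points may be seen by a banding together of the
plots of the functions. Since the functions preserve the distance property,
plots of the functions that are close together imply that the corresponding
data points are close together.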
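
The definition in (1) and the distance property can be checked numerically. The following is a minimal sketch in Python, not code from the report: the function names `andrews` and `integrated_squared_difference` and the two four-dimensional points are illustrative assumptions. It evaluates (1) and compares the integral of the squared difference of two functions over (-pi, pi) with pi times the squared Euclidean distance between the points, which is the form in which the distance property is usually stated.

```python
import math


def andrews(x, t):
    """Evaluate the Andrews function (1) of the point x at the angle t."""
    total = x[0] / math.sqrt(2.0)
    for k, value in enumerate(x[1:], start=1):
        harmonic = (k + 1) // 2                    # 1, 1, 2, 2, 3, 3, ...
        trig = math.sin if k % 2 == 1 else math.cos
        total += value * trig(harmonic * t)
    return total


def integrated_squared_difference(x, y, steps=2000):
    """Integrate [f_x(t) - f_y(t)]^2 over (-pi, pi) by the midpoint rule."""
    h = 2.0 * math.pi / steps
    total = 0.0
    for s in range(steps):
        t = -math.pi + (s + 0.5) * h
        total += (andrews(x, t) - andrews(y, t)) ** 2
    return total * h


# Two illustrative four-dimensional points (assumed values, not the report's data).
x = [5.1, 3.5, 1.4, 0.2]
y = [6.3, 3.3, 6.0, 2.5]

lhs = integrated_squared_difference(x, y)
rhs = math.pi * sum((a - b) ** 2 for a, b in zip(x, y))
print(lhs, rhs)   # the two values should agree up to rounding error
```

To draw the plot itself, `andrews(x, t)` would be evaluated on a grid of t values for every data point and the resulting curves overlaid on one set of axes.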


4.  Variation  of  model  parameters. 

Herzberg and Hickie (1981) considered the following. Let the regression
model be written in the form

$$ Y_j = X \beta_j + U_j \qquad (j = 1, \ldots, T - n + 1), \qquad (2) $$

where T is the total number of observations, n is the number of
observations in each subgroup of observations used for estimating the unknown
parameters, $Y_j = (y_{1j}, \ldots, y_{nj})'$ is an n x 1 vector, $y_{kj}$ being the kth
observation in the jth subgroup (k = 1, ..., n), X is the n x m matrix of
the regressors, $\beta_j$ is the m x 1 vector of unknown parameters and
$U_j$ is the n x 1 vector of error terms. All the elements of the $U_j$'s are
assumed to be independent and normally distributed with mean 0 and variance
$\sigma^2$. It is assumed that the T observations are taken sequentially over
time and it is desired to examine the variations in the $\beta_j$ over time.

Let $\hat{\beta}_j = (\hat{\beta}_{1j}, \ldots, \hat{\beta}_{mj})'$ be the m x 1 vector of least squares
estimates of the elements of the vector $\beta_j$ obtained from the jth set of
n observations (n < T), i.e. $\hat{\beta}_1$ is estimated from the first n
observations, $\hat{\beta}_2$ is estimated from the second observation to the (n+1)th
observation, etc. From each $\hat{\beta}_j$ plot the function $f_{\hat{\beta}_j}(t)$, defined in (1),
over the range $-\pi < t < \pi$. The plots of these functions will show the
change over time in the vector of coefficients $\hat{\beta}_j$.
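
The moving-window estimation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the normal equations are solved by plain Gaussian elimination purely for self-containment, and the short series and straight-line regressor matrix are made-up values chosen so that the change in the coefficients is easy to see.

```python
def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def least_squares(X, y):
    """Least squares estimate obtained from the normal equations (X'X) b = X'y."""
    m = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(m)] for p in range(m)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(m)]
    return solve(xtx, xty)


def moving_window_estimates(series, X):
    """Estimate the coefficient vector from each of the T - n + 1 windows of length n."""
    n = len(X)
    return [least_squares(X, series[j:j + n]) for j in range(len(series) - n + 1)]


# Assumed toy example: T = 8 observations, windows of n = 4, straight-line model.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 9.0, 11.0]
X = [[1.0, float(i)] for i in range(1, 5)]          # rows (1, i), i = 1, ..., 4

estimates = moving_window_estimates(series, X)
for j, estimate in enumerate(estimates, start=1):
    print(j, estimate)
```

Each estimated vector would then be passed to the function in (1) and the resulting curves plotted in the order j = 1, 2, ..., so that a drift in the coefficients appears as a drift in the curves.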

Herzberg and Hickie (1981) considered two sets of data using polynomial and
Fourier series models in (2). One of the sets of data has a cyclic effect,
the other a cyclic effect plus trend. For both sets of data it was known
that the period was 12 months. It could also be seen that every 12th plot was
similar.

For one of their examples, namely the monthly mean daily air temperatures
(°C) at sea level for England and Wales from January 1970 to December 1977 as
published in the Central Statistical Office Monthly Digest of Statistics
(HMSO), Herzberg and Hickie (1981) fitted the cubic polynomial model

$$ y_{j+i-1} = \beta_{1j} + \beta_{2j} i + \beta_{3j} i^2 + \beta_{4j} i^3 \qquad (3) $$

by least squares. Here $y_{j+i-1}$ is the observed temperature in month j + i - 1.
For each fixed j in (3), with i = 1, ..., 12, the least squares estimate
$\hat{\beta}_j = (\hat{\beta}_{1j}, \hat{\beta}_{2j}, \hat{\beta}_{3j}, \hat{\beta}_{4j})'$ of $\beta_j$ was obtained (j = 1, ..., 85). Figure 1 shows
the resulting Andrews' plots. The plots in Figure 1.k are those obtained
from $\hat{\beta}_j$ (j = k, k+12, k+24, k+36, k+48, k+60, k+72, k+84; j ≤ 85). It can be seen
that the plots in each of the Figures 1.k are similar.
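
The grouping of the 85 estimates into the twelve panels of Figure 1 can be made explicit with a short sketch. The function name `panel_indices` is a hypothetical helper introduced here for illustration; the period of 12 and the last index 85 are taken from the text (96 monthly observations in windows of 12 give 96 - 12 + 1 = 85 estimates).

```python
def panel_indices(k, period=12, last=85):
    """Indices j = k, k + period, k + 2*period, ... that do not exceed last."""
    return list(range(k, last + 1, period))


# Panel k collects the estimates whose windows start in the same calendar month.
panels = {k: panel_indices(k) for k in range(1, 13)}
for k, indices in panels.items():
    print(k, indices)
```

Only the first panel reaches j = 85; every other panel stops at or before k + 72, which is why the condition j ≤ 85 appears in the text.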

In situations where the period is unknown, Andrews' plots may be plotted
for several values of n in order to determine similarities and, therefore,
the length of the period.

Because  of  their  mathematical  and  resulting  statistical  properties, 
Andrews'  plots  can  be  used  as  a  tool  for  finding  outliers  in  a  time  series. 
Work  is  at  present  being  done  on  this  and  on  using  Andrews'  plots  as  a 
sequential  graphical  method  for  discriminating  among  models. 


the monthly mean daily air temperature (°C) at sea level for England and Wales,
January 1970 to December 1977; Figure 1.k (k = 1, ..., 12) consists of Andrews' plots